378 research outputs found
Sampled Policy Gradient for Learning to Play the Game Agar.io
In this paper, a new offline actor-critic learning algorithm is introduced:
Sampled Policy Gradient (SPG). SPG samples in the action space to calculate an
approximated policy gradient by using the critic to evaluate the samples. This
sampling allows SPG to search the action-Q-value space more globally than
deterministic policy gradient (DPG), enabling it to theoretically avoid more
local optima. SPG is compared to Q-learning and the actor-critic algorithms
CACLA and DPG in a pellet collection task and a self play environment in the
game Agar.io. The online game Agar.io has become massively popular on the
internet due to intuitive game design and the ability to instantly compete
against players around the world. From the point of view of artificial
intelligence, this game is also very intriguing: it has a continuous input
and action space and allows diverse agents with complex strategies to
compete against each other. The experimental results show that Q-Learning and
CACLA outperform a pre-programmed greedy bot in the pellet collection task, but
all algorithms fail to outperform this bot in a fighting scenario. The
analysis suggests that SPG is highly extensible through offline exploration,
and it matches DPG in performance even in its basic form without extensive
sampling.
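The sampling step described in this abstract can be illustrated with a minimal sketch. It assumes that candidate actions are drawn by adding Gaussian noise to the actor's current action and that the actor is then trained toward the best-scoring candidate; the abstract does not specify either detail. The names `toy_critic` and `spg_target`, the noise scale `sigma`, and the action bounds [-1, 1] are all hypothetical and chosen only for illustration.

```python
import numpy as np


def toy_critic(state, action):
    # Hypothetical stand-in for a learned critic Q(s, a):
    # the value is highest when the action equals -state.
    return -float(np.sum((action + state) ** 2))


def spg_target(state, actor_action, critic, n_samples=8, sigma=0.5, rng=None):
    """Return the best action among the actor's action and sampled candidates.

    Assumption: candidates are Gaussian perturbations of the actor's action,
    clipped to the (assumed) action bounds [-1, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    best_action = actor_action
    best_q = critic(state, actor_action)
    for _ in range(n_samples):
        noise = rng.normal(0.0, sigma, size=actor_action.shape)
        candidate = np.clip(actor_action + noise, -1.0, 1.0)
        q = critic(state, candidate)
        # Keep a candidate only if the critic rates it strictly higher.
        if q > best_q:
            best_action, best_q = candidate, q
    return best_action, best_q


state = np.array([0.2, -0.4])
actor_action = np.array([0.9, 0.9])
target, target_q = spg_target(
    state, actor_action, toy_critic, n_samples=16, rng=np.random.default_rng(0)
)
# The actor would then be regressed toward `target` (for example, with a
# squared-error loss) whenever `target` differs from its own action.
```

Because candidates are scored by the critic rather than followed along its local gradient, a larger `n_samples` or `sigma` lets the search look further from the current action, which is the property the abstract contrasts with DPG.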
Serbian Books of the 19th Century in the Vasyl Stefanyk Lviv Scientific Library (Based on the Holdings of the Rare Book Department)
In recent years the number and frequency of high-impact floods have increased and climate change effects are expected to increase flood risks even more. The European Union (EU) has recently established the Floods Directive as a framework for the assessment and management of these risks. The aim of this article is to explore factors that have hampered or stimulated the implementation process of the Floods Directive in the Netherlands, from its establishment in 2007 until January 2013. During this period, the first requirements of the Floods Directive had to be implemented, while the second and third obligations were to be in an advanced stage. Following a literature review of policy implementation theories and a content analysis of the Floods Directive, we have studied the implementation processes in the Dutch part of the Meuse and Rhine-West catchments. Perceptions of interviewees and survey respondents were used to identify influential factors. Our research shows that although the implementation process in the Netherlands is on schedule, it is iterative and complex. Various constraining and stimulating factors, affecting the implementation process, are distinguished. The article concludes with some suggestions for improving the further implementation of the Floods Directive.
Bandit-Inspired Memetic Algorithms for Solving Quadratic Assignment Problems
In this paper we propose a novel algorithm called the Bandit-Inspired Memetic Algorithm (BIMA) and apply it to several large instances of the Quadratic Assignment Problem (QAP). Like other memetic algorithms, BIMA makes use of local search and a population of solutions. The novelty lies in the use of multi-armed bandit algorithms and assignment matrices for generating novel solutions, which are then brought to a local minimum by local search. We have compared BIMA to multi-start local search (MLS) and iterated local search (ILS) on five QAP instances, and the results show that BIMA significantly outperforms these competitors.
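The local-search component mentioned in this abstract can be sketched as follows. This is not BIMA itself: the bandit-based solution generator is not described in enough detail here to reproduce. The sketch only shows the standard QAP objective and a pairwise-swap descent to a local minimum, and the pairwise-swap neighbourhood is an assumption. The function names `qap_cost` and `local_search` are hypothetical.

```python
import numpy as np


def qap_cost(flow, dist, perm):
    # Standard QAP objective: sum over i, j of flow[i, j] * dist[perm[i], perm[j]],
    # where perm[i] is the location assigned to facility i.
    return np.sum(flow * dist[np.ix_(perm, perm)])


def local_search(flow, dist, perm):
    """Descend by pairwise swaps until no single swap lowers the cost."""
    perm = perm.copy()
    best = qap_cost(flow, dist, perm)
    improved = True
    while improved:
        improved = False
        for i in range(len(perm) - 1):
            for j in range(i + 1, len(perm)):
                perm[i], perm[j] = perm[j], perm[i]  # try the swap
                cost = qap_cost(flow, dist, perm)
                if cost < best:
                    best = cost
                    improved = True
                else:
                    perm[i], perm[j] = perm[j], perm[i]  # undo the swap
    return perm, best


# Small random instance, for illustration only.
rng = np.random.default_rng(1)
n = 8
flow = rng.integers(0, 10, size=(n, n))
dist = rng.integers(1, 10, size=(n, n))
start = rng.permutation(n)
solution, cost = local_search(flow, dist, start)
```

Multi-start local search, one of the baselines named in the abstract, would repeat `local_search` from many random permutations and keep the best result.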
- …